A Systematic Measurement of the Influence of Non-Uniform Cache Sharing on the Performance of Modern Multithreaded Programs
Authors
Abstract
Most modern Chip Multiprocessors (CMPs) feature an on-chip shared cache, whose influence on the performance of multithreaded programs unfortunately remains unclear because prior studies covered only a limited set of the deciding factors. In this work, we conduct a systematic measurement of that influence using a recently released CMP benchmark suite, PARSEC, with a spectrum of factors considered. The measurement shows that, contrary to previous observations in multiprogramming environments and server programs, the shared cache has an insignificant influence on most of the programs, regardless of variations in input, thread assignment, and other factors. This suggests a mismatch between current multithreaded applications and CMP architectures, limited potential for thread co-scheduling on those programs, and the need for program-level transformations to exploit modern CMP architectures effectively. We apply a shared-cache-aware transformation to a data-mining program and obtain a speedup of over 30%, which preliminarily verifies the potential of program-level transformations.
Similar resources
Brief Announcement: Parallel Depth First vs. Work Stealing Schedulers on CMP Architectures
In chip multiprocessors (CMPs), limiting the number of off-chip cache misses is crucial for good performance. Many multithreaded programs provide opportunities for constructive cache sharing, in which concurrently scheduled threads share a largely overlapping working set. In this brief announcement, we highlight our ongoing study [4] comparing the performance of two schedulers desig...
Execution And Cache Performance Of A Decoupled Non-Blocking Multithreaded Architecture
In this paper we present an evaluation of the execution performance and cache behavior of a new multithreaded architecture being investigated by the authors. Our architecture uses a non-blocking multithreaded model based on the dataflow paradigm. In addition, all memory accesses are decoupled from the thread execution. Data is pre-loaded into the thread context (registers), and all results are p...
DTHREADS: Efficient and Deterministic Multithreading
Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors in multithreaded applications. One way to simplify multithreaded programming is to enforce deterministic execution. However, past deterministic systems are incomplete or impractical. Language-based approaches require programmers to write...
Comparative Review of the Performance Based Design of Building Structures Using Static Non-Linear Analysis, Part A: Steel Braced Frames
The objective of this review to be submitted in two independent parts, for steel frames and for RC frames, is to compare their structural performance with respect to the proposed N2-method, and so also of the consequent convenience of using pushover methodology for the seismic analysis of these structures. A preliminary investigation is presented on a pushover analysis used for the seismic perf...
Comparative Review of the Performance Based Design of Building Structures Using Static Non-Linear Analysis, Part B: R/C Frames
The objective of this review to be submitted in two independent parts, for steel frames and for RC frames, is to compare their structural performance with respect to the proposed N2-method, and so also of the consequent convenience of using pushover methodology for the seismic analysis of these structures. A preliminary investigation is presented on a pushover analysis used for the seismic perf...